Goto

Collaborating Authors

 Colorectal Cancer


Bayesian Semiparametric Mixture Cure (Frailty) Models

Kızılaslan, Fatih, Vitelli, Valeria

arXiv.org Machine Learning

In recent years, mixture cure models have gained increasing popularity in survival analysis as an alternative to the Cox proportional hazards model, particularly in settings where a subset of patients is considered cured. The proportional hazards mixture cure model is especially advantageous when the presence of a cured fraction can be reasonably assumed, providing a more accurate representation of long-term survival dynamics. In this study, we propose a novel hierarchical Bayesian framework for the semiparametric mixture cure model, which accommodates both the inclusion and exclusion of a frailty component, allowing for greater flexibility in capturing unobserved heterogeneity among patients. Samples from the posterior distribution are obtained using a Markov chain Monte Carlo method, leveraging a hierarchical structure inspired by Bayesian Lasso. Comprehensive simulation studies are conducted across diverse scenarios to evaluate the performance and robustness of the proposed models. Bayesian model comparison and assessment are performed using various criteria. Finally, the proposed approaches are applied to two well-known datasets in the cure model literature: the E1690 melanoma trial and a colon cancer clinical trial.


Methodology for Comparing Machine Learning Algorithms for Survival Analysis

Cardoso, Lucas Buk, Angelo, Simone Aldrey, Bonilha, Yasmin Pacheco Gil, Maia, Fernando, Ribeiro, Adeylson Guimarães, Curado, Maria Paula, Fernandes, Gisele Aparecida, Parro, Vanderlei Cunha, Cipparrone, Flávio Almeida de Magalhães, Filho, Alexandre Dias Porto Chiavegatto, Filho, Victor Wünsch, Toporcov, Tatiana Natasha

arXiv.org Artificial Intelligence

This study presents a comparative methodological analysis of six machine learning models for survival analysis (MLSA). Using data from nearly 45,000 colorectal cancer patients in the Hospital-Based Cancer Registries of São Paulo, we evaluated Random Survival Forest (RSF), Gradient Boosting for Survival Analysis (GBSA), Survival SVM (SSVM), XGBoost-Cox (XGB-Cox), XGBoost-AFT (XGB-AFT), and LightGBM (LGBM), capable of predicting survival considering censored data. Hyperparameter optimization was performed with different samplers, and model performance was assessed using the Concordance Index (C-Index), C-Index IPCW, time-dependent AUC, and Integrated Brier Score (IBS). Survival curves produced by the models were compared with predictions from classification algorithms, and predictor interpretation was conducted using SHAP and permutation importance. XGB-AFT achieved the best performance (C-Index = 0.7618; IPCW = 0.7532), followed by GBSA and RSF. The results highlight the potential and applicability of MLSA to improve survival prediction and support decision making.


Deep Pathomic Learning Defines Prognostic Subtypes and Molecular Drivers in Colorectal Cancer

Wang, Zisong, Wang, Xuanyu, Chen, Hang, Wang, Haizhou, Chen, Yuxin, Xu, Yihang, Yuan, Yunhe, Luo, Lihuan, Ling, Xitong, Liu, Xiaoping

arXiv.org Artificial Intelligence

Precise prognostic stratification of colorectal cancer (CRC) remains a major clinical challenge due to its high heterogeneity. The conventional TNM staging system is inadequate for personalized medicine. We aimed to develop and validate a novel multiple instance learning model TDAM-CRC using histopathological whole-slide images for accurate prognostic prediction and to uncover its underlying molecular mechanisms. We trained the model on the TCGA discovery cohort (n=581), validated it in an independent external cohort (n=1031), and further we integrated multi-omics data to improve model interpretability and identify novel prognostic biomarkers. The results demonstrated that the TDAM-CRC achieved robust risk stratification in both cohorts. Its predictive performance significantly outperformed the conventional clinical staging system and multiple state-of-the-art models. The TDAM-CRC risk score was confirmed as an independent prognostic factor in multivariable analysis. Multi-omics analysis revealed that the high-risk subtype is closely associated with metabolic reprogramming and an immunosuppressive tumor microenvironment. Through interaction network analysis, we identified and validated Mitochondrial Ribosomal Protein L37 (MRPL37) as a key hub gene linking deep pathomic features to clinical prognosis. We found that high expression of MRPL37, driven by promoter hypomethylation, serves as an independent biomarker of favorable prognosis. Finally, we constructed a nomogram incorporating the TDAM-CRC risk score and clinical factors to provide a precise and interpretable clinical decision-making tool for CRC patients. Our AI-driven pathological model TDAM-CRC provides a robust tool for improved CRC risk stratification, reveals new molecular targets, and facilitates personalized clinical decision-making.


EndoSight AI: Deep Learning-Driven Real-Time Gastrointestinal Polyp Detection and Segmentation for Enhanced Endoscopic Diagnostics

Cavadia, Daniel

arXiv.org Artificial Intelligence

Precise and real-time detection of gastrointestinal polyps during endoscopic procedures is crucial for early diagnosis and prevention of colorectal cancer. This work presents En-doSight AI, a deep learning architecture developed and evaluated independently to enable accurate polyp localization and detailed boundary delineation. Leveraging the publicly available Hyper-Kvasir dataset, the system achieves a mean A verage Precision (mAP) of 88.3% for polyp detection and a Dice coefficient of up to 69% for segmentation, alongside real-time inference speeds exceeding 35 frames per second on GPU hardware. The training incorporates clinically relevant performance metrics and a novel thermal-aware procedure to ensure model robustness and efficiency. This integrated AI solution is designed for seamless deployment in endoscopy workflows, promising to advance diagnostic accuracy and clinical decision-making in gastrointestinal healthcare.


Colorectal Cancer Histopathological Grading using Multi-Scale Federated Learning

Arafath, Md Ahasanul, Ghosh, Abhijit Kumar, Ahmed, Md Rony, Afroz, Sabrin, Hosen, Minhazul, Moon, Md Hasan, Reza, Md Tanzim, Alam, Md Ashad

arXiv.org Machine Learning

Colorectal cancer (CRC) grading is a critical prognostic factor but remains hampered by inter-observer variability and the privacy constraints of multi-institutional data sharing. While deep learning offers a path to automation, centralized training models conflict with data governance regulations and neglect the diagnostic importance of multi-scale analysis. In this work, we propose a scalable, privacy-preserving federated learning (FL) framework for CRC histopathological grading that integrates multi-scale feature learning within a distributed training paradigm. Our approach employs a dual-stream ResNetRS50 backbone to concurrently capture fine-grained nuclear detail and broader tissue-level context. This architecture is integrated into a robust FL system stabilized using FedProx to mitigate client drift across heterogeneous data distributions from multiple hospitals. Extensive evaluation on the CRC-HGD dataset demonstrates that our framework achieves an overall accuracy of 83.5%, outperforming a comparable centralized model (81.6%). Crucially, the system excels in identifying the most aggressive Grade III tumors with a high recall of 87.5%, a key clinical priority to prevent dangerous false negatives. Performance further improves with higher magnification, reaching 88.0% accuracy at 40x. These results validate that our federated multi-scale approach not only preserves patient privacy but also enhances model performance and generalization. The proposed modular pipeline, with built-in preprocessing, checkpointing, and error handling, establishes a foundational step toward deployable, privacy-aware clinical AI for digital pathology.


MicroAUNet: Boundary-Enhanced Multi-scale Fusion with Knowledge Distillation for Colonoscopy Polyp Image Segmentation

Wang, Ziyi, Zhang, Yuanmei, Esrafilzadeh, Dorna, Jalili, Ali R., Xiang, Suncheng

arXiv.org Artificial Intelligence

Early and accurate segmentation of colorectal polyps is critical for reducing colorectal cancer mortality, which has been extensively explored by academia and industry. However, current deep learning-based polyp segmentation models either compromise clinical decision-making by providing ambiguous polyp margins in segmentation outputs or rely on heavy architectures with high computational complexity, resulting in insufficient inference speeds for real-time colorectal endoscopic applications. To address this problem, we propose MicroAUNet, a light-weighted attention-based segmentation network that combines depthwise-separable dilated convolutions with a single-path, parameter-shared channel-spatial attention block to strengthen multi-scale boundary features. On the basis of it, a progressive two-stage knowledge-distillation scheme is introduced to transfer semantic and boundary cues from a high-capacity teacher. Extensive experiments on benchmarks also demonstrate the state-of-the-art accuracy under extremely low model complexity, indicating that MicroAUNet is suitable for real-time clinical polyp segmentation. The code is publicly available at https://github.com/JeremyXSC/MicroAUNet.


Morphology-Aware Prognostic model for Five-Year Survival Prediction in Colorectal Cancer from H&E Whole Slide Images

Sajjad, Usama, Akbar, Abdul Rehman, Su, Ziyu, Knight, Deborah, Frankel, Wendy L., Gurcan, Metin N., Chen, Wei, Niazi, Muhammad Khalid Khan

arXiv.org Artificial Intelligence

Colorectal cancer (CRC) remains the third most prevalent malignancy globally, with approximately 154,000 new cases and 54,000 projected deaths anticipated for 2025. The recent advancement of foundation models in computational pathology has been largely propelled by task agnostic methodologies that can overlook organ-specific crucial morphological patterns that represent distinct biological processes that can fundamentally influence tumor behavior, therapeutic response, and patient outcomes. The aim of this study is to develop a novel, interpretable AI model, PRISM (Prognostic Representation of Integrated Spatial Morphology), that incorporates a continuous variability spectrum within each distinct morphology to characterize phenotypic diversity and reflecting the principle that malignant transformation occurs through incremental evolutionary processes rather than abrupt phenotypic shifts. PRISM is trained on 8.74 million histological images extracted from surgical resection specimens of 424 patients with stage III CRC. PRISM achieved superior prognostic performance for five-year OS (AUC = 0.70 +- 0.04; accuracy = 68.37% +- 4.75%; HR = 3.34, 95% CI = 2.28-4.90; p < 0.0001), outperforming existing CRC-specific methods by 15% and AI foundation models by ~23% accuracy. It showed sex-agnostic robustness (AUC delta = 0.02; accuracy delta = 0.15%) and stable performance across clinicopathological subgroups, with minimal accuracy fluctuation (delta = 1.44%) between 5FU/LV and CPT-11/5FU/LV regimens, replicating the Alliance cohort finding of no survival difference between treatments.


A Lightweight and Robust Framework for Real-Time Colorectal Polyp Detection Using LOF-Based Preprocessing and YOLO-v11n

Behzadi, Saadat, Sharifrazi, Danial, Mesbahzadeh, Bita, Joloudari, Javad Hassannataj, Alizadehsani, Roohallah

arXiv.org Artificial Intelligence

Objectives: Timely and accurate detection of colorectal polyps plays a crucial role in diagnosing and preventing colorectal cancer, a major cause of mortality worldwide. This study introduces a new, lightweight, and efficient framework for polyp detection that combines the Local Outlier Factor (LOF) algorithm for filtering noisy data with the YOLO-v11n deep learning model. Study design: An experimental study leveraging deep learning and outlier removal techniques across multiple public datasets. Methods: The proposed approach was tested on five diverse and publicly available datasets: CVC-ColonDB, CVC-ClinicDB, Kvasir-SEG, ETIS, and EndoScene. Since these datasets originally lacked bounding box annotations, we converted their segmentation masks into suitable detection labels. To enhance the robustness and generalizability of our model, we apply 5-fold cross-validation and remove anomalous samples using the LOF method configured with 30 neighbors and a contamination ratio of 5%. Cleaned data are then fed into YOLO-v11n, a fast and resource-efficient object detection architecture optimized for real-time applications. We train the model using a combination of modern augmentation strategies to improve detection accuracy under diverse conditions. Results: Our approach significantly improves polyp localization performance, achieving a precision of 95.83%, recall of 91.85%, F1-score of 93.48%, mAP@0.5 of 96.48%, and mAP@0.5:0.95 of 77.75%. Compared to previous YOLO-based methods, our model demonstrates enhanced accuracy and efficiency. Conclusions: These results suggest that the proposed method is well-suited for real-time colonoscopy support in clinical settings. Overall, the study underscores how crucial data preprocessing and model efficiency are when designing effective AI systems for medical imaging.


PVTAdpNet: Polyp Segmentation using Pyramid vision transformer with a novel Adapter block

Nezhad, Arshia Yousefi, Aghaei, Helia, Sajedi, Hedieh

arXiv.org Artificial Intelligence

Colorectal cancer ranks among the most common and deadly cancers, emphasizing the need for effective early detection and treatment. To address the limitations of traditional colonoscopy, including high miss rates due to polyp variability, we introduce the Pyramid Vision Transformer Adapter Residual Network (PVTAdpNet). This model integrates a U-Net-style encoder-decoder structure with a Pyramid Vision Transformer backbone, novel residual blocks, and adapter-based skip connections. The design enhances feature extraction, dense prediction, and gradient flow, supported by squeeze-and-excitation attention for improved channel-wise feature refinement. PVTAdpNet achieves real-time, accurate polyp segmentation, demonstrating superior performance on benchmark datasets with high mDice and mIoU scores, making it highly suitable for clinical applications. PVTAdpNet obtains a high Dice coefficient of 0.8851 and a mean Intersection over Union (mIoU) of 0.8167 on out-of-distribution polyp datasets. Evaluation of the PolypGen dataset demonstrates PVTAdpNet's capability for real-time, accurate performance within familiar distributions. The source code of our network is available at https://github.com/ayousefinejad/PVTAdpNet.git


DNABERT-2: Fine-Tuning a Genomic Language Model for Colorectal Gene Enhancer Classification

King, Darren, Atlasi, Yaser, Rafiee, Gholamreza

arXiv.org Artificial Intelligence

Gene enhancers control when and where genes switch on, yet their sequence diversity and tissue specificity make them hard to pinpoint in colorectal cancer. We take a sequence-only route and fine-tune DNABERT-2, a transformer genomic language model that uses byte-pair encoding to learn variable-length tokens from DNA. Using assays curated via the Johnston Cancer Research Centre at Queen's University Belfast, we assembled a balanced corpus of 2.34 million 1 kb enhancer sequences, applied summit-centered extraction and rigorous de-duplication including reverse-complement collapse, and split the data stratified by class. With a 4096-term vocabulary and a 232-token context chosen empirically, the DNABERT-2-117M classifier was trained with Optuna-tuned hyperparameters and evaluated on 350742 held-out sequences. The model reached PR-AUC 0.759, ROC-AUC 0.743, and best F1 0.704 at an optimized threshold (0.359), with recall 0.835 and precision 0.609. Against a CNN-based EnhancerNet trained on the same data, DNABERT-2 delivered stronger threshold-independent ranking and higher recall, although point accuracy was lower. To our knowledge, this is the first study to apply a second-generation genomic language model with BPE tokenization to enhancer classification in colorectal cancer, demonstrating the feasibility of capturing tumor-associated regulatory signals directly from DNA sequence alone. Overall, our results show that transformer-based genomic models can move beyond motif-level encodings toward holistic classification of regulatory elements, offering a novel path for cancer genomics. Next steps will focus on improving precision, exploring hybrid CNN-transformer designs, and validating across independent datasets to strengthen real-world utility.